LIA at TREC 2012 Web Track: Unsupervised Search Concepts Identification from General Sources of Information
Abstract
In this paper, we report the experiments we conducted for our participation in the TREC 2012 Web Track. We experimented with a new system that models the latent concepts underlying a query. We use Latent Dirichlet Allocation (LDA), a generative probabilistic topic model, to extract highly specific query-related topics from pseudo-relevant feedback documents. We define these topics as the latent concepts of the user query. Our approach automatically estimates both the number of latent concepts and the number of feedback documents needed, without any prior training step. These concepts are incorporated into the ranking function with the aim of promoting documents that refer to many different query-related themes. We also explored the use of different types of information sources for modeling the latent concepts. For this purpose, we use four general sources of information of different kinds (web, news, encyclopedic) from which the feedback documents are extracted.
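As a rough illustration of how latent concepts could promote documents that cover several query-related themes, the following Python sketch scores a document against a set of weighted concepts. It is not the ranking function used in the paper, which the abstract does not specify: the function name `concept_score`, the smoothing constant `eps`, the log-linear combination, and the toy concepts are all assumptions made for this example. In practice the concepts would come from a topic model such as LDA fitted on the feedback documents.

```python
import math
from collections import Counter

def concept_score(doc_tokens, concepts, eps=0.01):
    """Score a document against weighted latent concepts (illustrative only).

    concepts: list of (concept_weight, {word: word_weight}) pairs. In a real
    system these would be topics learned from feedback documents; here they
    are supplied by hand.
    """
    counts = Counter(doc_tokens)
    length = len(doc_tokens)
    score = 0.0
    for concept_weight, words in concepts:
        for word, word_weight in words.items():
            # Smoothed relative frequency, so absent words give a finite log.
            p = (counts[word] + eps) / (length + eps)
            score += concept_weight * word_weight * math.log(p)
    return score

# Toy, hand-made concepts (hypothetical values, not taken from the paper).
concepts = [
    (0.6, {"solar": 0.7, "panel": 0.3}),
    (0.4, {"tax": 0.5, "credit": 0.5}),
]
covers_both = ["solar", "panel", "tax", "credit"]
covers_one = ["solar", "solar", "panel", "panel"]

print(concept_score(covers_both, concepts) > concept_score(covers_one, concepts))
```

Because the logarithm penalizes concept words that are missing from the document, the document touching both toy concepts scores higher than the one that repeats a single concept, which mirrors the stated aim of favoring documents that refer to many query-related themes.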
Similar resources
LIA-iSmart at TREC 2010: An Unsupervised Web-Based Approach for Filtering Answers
Searching for named entities has been the subject of much research in information retrieval. Our goal in participating in the TREC 2010 Entity Ranking track is to recognize named entities of arbitrary categories and to use this recognition to rank candidate named entities. We propose to address the issue by means of a web-oriented language modeling approach.
LIA at TREC 2011 Web Track: Experiments on the Combination of Online Resources
In this paper, we report the experiments we conducted for our participation in the TREC 2011 Web Track. This year's experiments aim at discovering how combining specific external resources in a language modeling framework can help web search. We use Wikipedia and Google as external resources for different search contexts.
Quantification et identification des concepts implicites d'une requête
In this paper we introduce an unsupervised method for mining and modeling latent search concepts. We use Latent Dirichlet Allocation (LDA), a generative probabilistic topic model, to extract highly specific query-related topics from pseudo-relevant feedback documents. Our approach automatically estimates both the number of latent concepts and the number of feedback documents needed, without a...
WIDIT in TREC 2004 Genomics, Hard, Robust and Web Tracks
To facilitate understanding of information as well as its discovery, we need to combine the capabilities of the human and the machine as well as multiple methods and sources of evidence. Web Information Discovery Tool (WIDIT) Laboratory at the Indiana University School of Library and Information Science houses several projects that aim to apply this idea of multi-level fusion in the areas of in...
Microblog Search and Filtering with Time Sensitive Feedback and Thresholding Based on BM25
Microblogs such as Twitter are considered fast, first-hand sources of information with many real-time characteristics. We report our work on the real-time adhoc search and filtering tasks of the TREC 2012 Microblog track. Our system builds on the traditional BM25 relevance model, to which we add specific techniques aimed at finding relevant tweets. In the real-time adhoc t...
Publication date: 2012